Discovering Relational Item Sets Efficiently

Authors

  • Arne Koopman
  • Arno Siebes
Abstract

Frequent item set mining is a major data mining research area. Generalising from the standard single-table case to a multi-relational setting is simple in principle but hard in practice. It is simple because frequent item sets are easy to define in the multi-relational setting and the A-Priori algorithm is easy to extend to it. It is hard because the well-known frequent pattern explosion at low min-sup settings is far worse than in the standard case. In this paper we introduce an effective algorithm for the discovery of frequent multi-relational item sets. These relational patterns show which item sets occur together, answering questions like: ‘What type of Books are bought together with what Record types?’ Hence, they provide a symmetric insight into the relation and reveal patterns that are relevant with respect to the relation. The paper extends our earlier work on using MDL to discover a small set of characteristic item sets. The algorithm, R-KRIMP, first discovers a small set of characteristic patterns in the single tables and then combines these to find a small set of characteristic multi-relational item sets. This reduces the original search space dramatically and hence brings down the computational complexity by orders of magnitude. In the experiments we show that this approach yields a very good approximation of the naive approach, which joins all tables into one huge table, while being far more efficient.
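
The two-stage idea in the abstract (mine patterns per table, then combine them across the relation) can be illustrated with a minimal Python sketch. This is not R-KRIMP: the MDL-based selection of characteristic patterns is replaced here by a plain minimum-support filter, and the support of a pattern pair is assumed to be the number of relation tuples whose two rows contain the respective item sets. The table contents, the function names `frequent_item_sets` and `relational_item_sets`, and the thresholds are all hypothetical.

```python
from itertools import combinations


def frequent_item_sets(rows, min_sup, max_size=2):
    """Naively enumerate item sets with support >= min_sup in one table.

    Assumption: stands in for the per-table pattern discovery step;
    the paper uses MDL-based selection, not a plain support threshold.
    """
    items = sorted(set().union(*rows))
    result = {}
    for size in range(1, max_size + 1):
        for cand in combinations(items, size):
            cand_set = frozenset(cand)
            support = sum(1 for row in rows if cand_set <= row)
            if support >= min_sup:
                result[cand_set] = support
    return result


def relational_item_sets(left, right, relation, left_patterns,
                         right_patterns, min_sup):
    """Pair per-table patterns and count support over the relation.

    Assumption: a pair (lp, rp) is supported by a relation tuple (i, j)
    when lp occurs in left[i] and rp occurs in right[j].
    """
    result = {}
    for lp in left_patterns:
        for rp in right_patterns:
            support = sum(
                1 for (i, j) in relation
                if lp <= left[i] and rp <= right[j]
            )
            if support >= min_sup:
                result[(lp, rp)] = support
    return result


# Hypothetical toy data: two tables and a "bought together" relation
# given as (book row index, record row index) pairs.
books = [
    frozenset({"fiction", "paperback"}),
    frozenset({"fiction", "hardcover"}),
    frozenset({"science", "hardcover"}),
]
records = [
    frozenset({"jazz", "vinyl"}),
    frozenset({"rock", "cd"}),
    frozenset({"jazz", "cd"}),
]
bought_together = [(0, 0), (0, 2), (1, 0), (1, 2), (2, 1)]

# Stage 1: patterns per table. Stage 2: combine only those patterns.
book_patterns = frequent_item_sets(books, min_sup=2)
record_patterns = frequent_item_sets(records, min_sup=2)
pairs = relational_item_sets(
    books, records, bought_together,
    book_patterns, record_patterns, min_sup=3,
)

for (lp, rp), support in pairs.items():
    print(sorted(lp), "with", sorted(rp), "support", support)
```

Because the second stage only pairs patterns that survived the first stage, the number of candidates is the product of two small pattern sets rather than all item sets over the joined table, which is the search-space reduction the abstract refers to.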

Similar resources

Generating Similar Item Sets Of Temporal Databases Using Spamine Algorithm

Data mining is the process of extracting interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from large information repositories such as relational databases, data warehouses, XML repositories, etc. Data mining is known as one of the core processes of Knowledge Discovery in Databases (KDD). Association rule mining is a popular and well research...

DWMiner: A Tool for Mining Frequent Item Sets Efficiently in Data Warehouses

This work presents DWMiner, an efficient association rule mining tool that processes data directly over a relational DBMS data warehouse. DWMiner executes the Apriori algorithm as SQL queries in parallel, using a database PC cluster middleware developed for SQL query optimization in OLAP applications. DWMiner combines intra- and inter-query parallelism in order to reduce the total time needed to fin...

Efficient Utility Based Infrequent Weighted Item-Set Mining

Association Rule Mining (ARM) is one of the most popular data mining techniques. Most past work is based on frequent item-sets. In recent years, researchers have focused on infrequent item-set mining. The infrequent item-set mining problem is to discover item-sets whose frequency in the data is less than or equal to a maximum threshold. This paper addresses the min...

An Rule Based Mining Database with Similarity on Large Probabilistic Graph Matching

Mining frequent itemsets is an active area in data mining that aims at finding interesting relationships between items in databases. It can be used to address a wide variety of problems such as discovering association rules, sequential patterns, correlations and much more. Existing methods often generate a huge set of potential high utility item sets and their mining performance is degrad...

An Efficient Frequent Pattern Mining Algorithm to Find the Existence of K-Selective Interesting Patterns in Large Dataset Using SIFPMM

Association rule mining in huge databases is one of the most popular data exploration techniques for business decision makers. Discovering frequent item sets is the fundamental process in association rule mining. Several algorithms have been introduced in the literature to find frequent patterns. Those algorithms discover all combinations of frequent item sets for a given minimum support threshold. But som...


Publication date: 2008